Loss functions for object detection

A loss function measures the distance between a neural network's predictions and the expected outputs (labels): the closer the prediction is to the label, the smaller the loss value.
In object detection, losses are commonly divided into classification losses and regression losses.

L1 loss

L1 loss, also known as mean absolute error (MAE), is the average absolute difference between the model's prediction f(x) and the ground truth y:

MAE = (1/n) · Σᵢ |f(xᵢ) − yᵢ|

Advantages:
The derivative of the L1 loss is constant (±1), so the gradient is stable and there is no gradient explosion.
Disadvantages:
Even when the error lies between −1 and 1, the gradient magnitude is still 1; it does not shrink as the error does. In this region the error is already small, so we would prefer a smaller gradient that lets the prediction approach the target slowly and smoothly.
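As a minimal standalone sketch of the definition above (the function name `mae` is illustrative):

```python
import numpy as np

def mae(pred, target):
    """L1 loss: mean absolute error between predictions and targets."""
    return np.mean(np.abs(pred - target))

# The gradient w.r.t. each prediction is sign(pred - target) / n,
# so its magnitude stays constant no matter how small the error gets.
pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.5, 2.0, 2.0])
print(mae(pred, target))  # 0.5
```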

L2 loss

L2 loss, also known as mean squared error (MSE), is the average squared difference between the model's prediction f(x) and the ground truth y:

MSE = (1/n) · Σᵢ (f(xᵢ) − yᵢ)²

Advantages:
The curve is continuous and differentiable everywhere. As the error decreases, the gradient decreases too, which helps convergence to the minimum.
Disadvantages:
When the error x is large, the derivative 2x is also large, so the loss is very sensitive to outliers: a single outlier can dominate the gradient and destabilize training (poor robustness).
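The outlier sensitivity is easy to demonstrate numerically (the function name `mse` is illustrative):

```python
import numpy as np

def mse(pred, target):
    """L2 loss: mean squared error between predictions and targets."""
    return np.mean((pred - target) ** 2)

pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.0, 2.0, 3.0])
# A single outlier (error 10) dominates the loss because it is squared:
pred_outlier = np.array([1.0, 2.0, 13.0])
print(mse(pred, target))          # 0.0
print(mse(pred_outlier, target))  # 100/3, about 33.33
```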

Putting the two together: when the error is small we want a smooth loss with a shrinking gradient so the prediction approaches the target slowly, and when the error is large we want a bounded, robust gradient. This motivates a variant that combines the L1 and L2 losses: Smooth L1.

Smooth L1 loss

Simply put, Smooth L1 is a smoothed version of the L1 loss. It is a piecewise function of the error x between the prediction and the ground truth:

SmoothL1(x) = 0.5 · x²      if |x| < 1
SmoothL1(x) = |x| − 0.5     otherwise

Inside [−1, 1] it behaves like the L2 loss, which removes the non-differentiable corner of L1 at 0; outside [−1, 1] it behaves like the L1 loss, which avoids the exploding gradients that L2 suffers on outliers. It therefore bounds the gradient from both directions:

  • When the error between the predicted value and the real value is too large, the gradient value will not be too large;
  • When the error between the predicted value and the real value is small, the gradient value is small enough.
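The piecewise definition above translates directly into code (a sketch with the default threshold of 1, matching the formula):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 on the error x = pred - target (threshold 1)."""
    absx = np.abs(x)
    # quadratic near zero, linear far from zero
    return np.where(absx < 1, 0.5 * x ** 2, absx - 0.5)

errors = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(smooth_l1(errors))  # 2.5, 0.125, 0, 0.125, 2.5
```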

The figure below plots the three loss curves together for comparison:

(figure: L1, L2, and Smooth L1 loss curves)

IOU loss

IoU, the intersection over union, is the most widely used metric in object detection. In anchor-based methods it serves both to assign positive and negative samples and to evaluate how far a predicted box is from the ground-truth box:

IoU = |A ∩ B| / |A ∪ B|

Advantages:
IoU directly reflects how well the predicted box matches the ground-truth box.
It is also scale invariant: the same overlap quality gives the same IoU regardless of box size, which makes it the most natural metric for judging the distance between a predicted box and the ground truth in a regression task.
Disadvantages:
If the two boxes do not intersect, IoU = 0 by definition, which says nothing about how far apart they are. Worse, the loss is then constant, so there is no gradient feedback and training cannot proceed.
IoU also cannot distinguish different ways of overlapping. In the figure below, the three cases have equal IoU, yet their alignment quality clearly differs: the left case is the best regression result and the right case the worst.
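A minimal pure-Python sketch of the definition (the function name `iou_xyxy` is illustrative) also demonstrates the zero-overlap failure mode:

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(ix1 - ix0, 0), max(iy1 - iy0, 0)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou_xyxy((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> 1/7
print(iou_xyxy((0, 0, 1, 1), (5, 5, 6, 6)))  # disjoint -> 0.0, no gradient signal
```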
(figure: three box pairs with equal IoU but different alignment quality)

import torch

def box_area(boxes):
    # boxes: [N, 4] in (x0, y0, x1, y1) format
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

def box_iou_pairwise(boxes1, boxes2):
    # element-wise IoU between two equally sized box sets, (x0, y0, x1, y1) format
    area1 = box_area(boxes1)
    area2 = box_area(boxes2)

    lt = torch.max(boxes1[:, :2], boxes2[:, :2])  # [N,2] intersection top-left
    rb = torch.min(boxes1[:, 2:], boxes2[:, 2:])  # [N,2] intersection bottom-right

    wh = (rb - lt).clamp(min=0)  # [N,2] intersection width/height
    inter = wh[:, 0] * wh[:, 1]  # [N]

    union = area1 + area2 - inter

    iou = inter / union
    return iou, union


GIOU loss

GIoU fixes the zero-gradient problem. First compute the area of the minimum closure region Ac of the two boxes (in plain terms: the area of the smallest box that contains both the predicted box and the ground-truth box). Then compute the IoU, and the proportion of the closure area not covered by either box; subtracting that proportion from the IoU gives GIoU:

GIoU = IoU − |C − (A ∪ B)| / |C|,   Loss = 1 − GIoU

Advantages:
Even when the two boxes do not overlap, the closure term is nonzero and grows as the boxes move apart, so the loss still provides a gradient; GIoU takes values in (−1, 1].
Disadvantages:
When one box completely contains the other, the closure region equals the union, the extra term vanishes, and GIoU degenerates to plain IoU, so it can no longer distinguish where inside the larger box the smaller one sits.
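A small pure-Python sketch (the function name `giou_xyxy` is illustrative) shows that disjoint boxes now get a useful, distance-dependent signal:

```python
def giou_xyxy(a, b):
    """GIoU of two (x0, y0, x1, y1) boxes: IoU minus the share of the
    smallest enclosing box not covered by the union."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # smallest box enclosing both
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    closure = cw * ch
    return inter / union - (closure - union) / closure

# Disjoint boxes: IoU = 0, but GIoU is negative and keeps shrinking as the
# boxes move apart, so the loss 1 - GIoU still provides a gradient.
print(giou_xyxy((0, 0, 1, 1), (2, 0, 3, 1)))  # -1/3
print(giou_xyxy((0, 0, 1, 1), (4, 0, 5, 1)))  # -0.6
```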

import torch

def box_area(boxes):
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

def box_iou(boxes1, boxes2):
    # pairwise [N, M] IoU matrix between two box sets
    area1 = box_area(boxes1)
    area2 = box_area(boxes2)

    lt = torch.max(boxes1[:, None, :2], boxes2[:, :2])  # [N,M,2]
    rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])  # [N,M,2]

    wh = (rb - lt).clamp(min=0)
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N,M]

    union = area1[:, None] + area2 - inter
    iou = inter / union
    return iou, union

def generalized_box_iou(boxes1, boxes2):
    """
    Generalized IoU from https://giou.stanford.edu/

    The boxes should be in [x0, y0, x1, y1] format

    Returns a [N, M] pairwise matrix, where N = len(boxes1)
    and M = len(boxes2)
    """
    # degenerate boxes give inf / nan results, so do an early check
    assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
    assert (boxes2[:, 2:] >= boxes2[:, :2]).all()

    iou, union = box_iou(boxes1, boxes2)

    # smallest enclosing box of every pair
    lt = torch.min(boxes1[:, None, :2], boxes2[:, :2])
    rb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])

    wh = (rb - lt).clamp(min=0)  # [N,M,2]
    area = wh[:, :, 0] * wh[:, :, 1]

    return iou - (area - union) / (area + 1e-6)

DIOU loss

DIoU addresses the following problem: when the predicted box lies entirely inside the ground-truth box, the IoU loss and the GIoU loss are equally large no matter where the prediction sits, because GIoU degenerates to IoU. DIoU therefore adds a penalty on the distance between the box centers:

DIoU = IoU − ρ²(b, b^gt) / c²,   Loss = 1 − DIoU

In the loss above, b and b^gt denote the center points of the predicted (anchor) box and the target box, ρ(·,·) denotes the Euclidean distance between the two centers, and c is the diagonal length of the smallest box that simultaneously covers the anchor and the target box.
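As a worked scalar example (pure Python, the function name `diou_xyxy` is illustrative):

```python
def diou_xyxy(a, b):
    """DIoU = IoU - rho^2 / c^2 for (x0, y0, x1, y1) boxes, where rho is the
    center distance and c the diagonal of the smallest enclosing box."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / union
    # squared distance between the two box centers
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    rho2 = (ax - bx) ** 2 + (ay - by) ** 2
    # squared diagonal of the smallest enclosing box
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2
    return iou - rho2 / c2

# A perfect match gives DIoU = 1; offset centers are penalized
# even when the IoU alone would not separate the cases.
print(diou_xyxy((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
print(diou_xyxy((0, 0, 2, 2), (1, 1, 3, 3)))
```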

import torch

def Diou(bboxes1, bboxes2):
    # element-wise DIoU between paired box sets in (xmin, ymin, xmax, ymax) format
    rows = bboxes1.shape[0]
    cols = bboxes2.shape[0]
    dious = torch.zeros((rows, cols))
    if rows * cols == 0:  # empty input, nothing to compute
        return dious
    exchange = False
    if bboxes1.shape[0] > bboxes2.shape[0]:
        bboxes1, bboxes2 = bboxes2, bboxes1
        dious = torch.zeros((cols, rows))
        exchange = True
    # xmin, ymin, xmax, ymax -> [:, 0], [:, 1], [:, 2], [:, 3]
    w1 = bboxes1[:, 2] - bboxes1[:, 0]
    h1 = bboxes1[:, 3] - bboxes1[:, 1]
    w2 = bboxes2[:, 2] - bboxes2[:, 0]
    h2 = bboxes2[:, 3] - bboxes2[:, 1]

    area1 = w1 * h1
    area2 = w2 * h2

    center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2
    center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2
    center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2
    center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2

    inter_max_xy = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])
    inter_min_xy = torch.max(bboxes1[:, :2], bboxes2[:, :2])
    out_max_xy = torch.max(bboxes1[:, 2:], bboxes2[:, 2:])
    out_min_xy = torch.min(bboxes1[:, :2], bboxes2[:, :2])

    inter = torch.clamp((inter_max_xy - inter_min_xy), min=0)
    inter_area = inter[:, 0] * inter[:, 1]
    inter_diag = (center_x2 - center_x1) ** 2 + (center_y2 - center_y1) ** 2
    outer = torch.clamp((out_max_xy - out_min_xy), min=0)
    outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2)
    union = area1 + area2 - inter_area
    dious = inter_area / union - (inter_diag) / outer_diag
    dious = torch.clamp(dious, min=-1.0, max=1.0)
    if exchange:
        dious = dious.T
    return dious

CIOU loss

CIoU addresses a remaining problem: DIoU ignores the aspect ratio, so predictions with the same center distance and IoU but different shapes receive the same loss value.

CIoU therefore introduces an aspect-ratio consistency term v with a dynamic weight α in front of it:

v = (4/π²) · (arctan(w^gt/h^gt) − arctan(w/h))²
α = v / ((1 − IoU) + v)
Loss = 1 − IoU + ρ²(b, b^gt)/c² + α·v

The constant 4/π² in v is an empirical normalizer, and functions other than arctan could serve the same purpose; the intent is simply that the smaller the difference between the two aspect ratios, the smaller the penalty. The weight α grows with IoU: when the IoU is small (the boxes barely overlap, as in the right figure below), the aspect ratio hardly matters yet, and even if the two ratios already match, the priority is to grow the overlap, so α should be small. When the IoU is large (the boxes are already similar in size and position, as in the left figure), α becomes large and the loss shifts its attention to matching the shape.

(figure: two box pairs with the same aspect ratio but different IoU)
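The v and α formulas above can be checked on scalars (the function name `ciou_terms` is illustrative):

```python
import math

def ciou_terms(w1, h1, w2, h2, iou):
    """Aspect-ratio penalty v and its dynamic weight alpha from CIoU."""
    v = (4 / math.pi ** 2) * (math.atan(w1 / h1) - math.atan(w2 / h2)) ** 2
    alpha = v / ((1 - iou) + v)
    return v, alpha

# Equal aspect ratios -> v = 0: no extra penalty regardless of IoU.
v, alpha = ciou_terms(2, 1, 4, 2, iou=0.5)
print(v, alpha)  # 0.0 0.0

# Different aspect ratios: a higher IoU yields a larger alpha, i.e. the
# shape term matters more once the boxes already overlap well.
print(ciou_terms(2, 1, 1, 2, iou=0.2)[1] < ciou_terms(2, 1, 1, 2, iou=0.8)[1])  # True
```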

import math
import torch

def box_ciou(b1, b2):
    """
    Inputs:
    ----------
    b1: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
    b2: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh

    Returns:
    -------
    ciou: tensor, shape=(batch, feat_w, feat_h, anchor_num, 1)
    """
    # corners of the predicted boxes
    b1_xy = b1[..., :2]
    b1_wh = b1[..., 2:4]
    b1_wh_half = b1_wh / 2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half

    # corners of the ground-truth boxes
    b2_xy = b2[..., :2]
    b2_wh = b2[..., 2:4]
    b2_wh_half = b2_wh / 2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half

    # IoU between predicted and ground-truth boxes
    intersect_mins = torch.max(b1_mins, b2_mins)
    intersect_maxes = torch.min(b1_maxes, b2_maxes)
    intersect_wh = torch.max(intersect_maxes - intersect_mins, torch.zeros_like(intersect_maxes))
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    union_area = b1_area + b2_area - intersect_area
    iou = intersect_area / torch.clamp(union_area, min=1e-6)

    # squared distance between the box centers
    center_distance = torch.sum(torch.pow((b1_xy - b2_xy), 2), dim=-1)

    # corners of the smallest box enclosing both
    enclose_mins = torch.min(b1_mins, b2_mins)
    enclose_maxes = torch.max(b1_maxes, b2_maxes)
    enclose_wh = torch.max(enclose_maxes - enclose_mins, torch.zeros_like(intersect_maxes))

    # squared diagonal of the enclosing box (the DIoU penalty)
    enclose_diagonal = torch.sum(torch.pow(enclose_wh, 2), dim=-1)
    ciou = iou - 1.0 * (center_distance) / torch.clamp(enclose_diagonal, min=1e-6)

    # aspect-ratio penalty v and its dynamic weight alpha
    v = (4 / (math.pi ** 2)) * torch.pow((torch.atan(b1_wh[..., 0] / torch.clamp(b1_wh[..., 1], min=1e-6)) - torch.atan(
        b2_wh[..., 0] / torch.clamp(b2_wh[..., 1], min=1e-6))), 2)
    alpha = v / torch.clamp((1.0 - iou + v), min=1e-6)
    ciou = ciou - alpha * v
    return ciou

Alpha IOU

Alpha-IoU generalizes the IoU-based losses above into a family of power IoU losses: the IoU term (and the penalty terms in GIoU/DIoU/CIoU) are raised to a power α, with the basic form Loss = 1 − IoU^α; α = 3 is the default reported in the paper.

Why is Alpha-IoU useful?

With α > 1, the power form adaptively up-weights the loss and gradient of high-IoU objects, which improves localization accuracy, and it also makes training more robust to small amounts of noise in the ground-truth boxes.
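A minimal sketch of the basic power form (only the 1 − IoU^α term; the generalized penalty variants are not shown):

```python
def alpha_iou_loss(iou, alpha=3.0):
    """Basic power IoU loss from Alpha-IoU: L = 1 - IoU^alpha."""
    return 1.0 - iou ** alpha

# Compared with the plain 1 - IoU, the power form leaves more loss (and
# relatively more gradient) on high-IoU boxes, pushing already-decent
# predictions to fit the target more tightly.
for iou in (0.5, 0.9):
    print(1 - iou, alpha_iou_loss(iou))
```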


Origin blog.csdn.net/pengxiang1998/article/details/130371820